
Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs


Abstract

The lattice Boltzmann method (LBM) for solving fluid flow is naturally well suited to efficient implementation on massively parallel hardware, due to the prevalence of local operations in the algorithm. This paper presents and analyses the performance of a 3D lattice Boltzmann solver optimized for third-generation nVidia GPU hardware, also known as 'Kepler'. We review previous optimization strategies and analyse data read/write times for different memory types. In LBM, the time-propagation step (known as streaming) involves shifting data to adjacent locations and is central to parallel performance. Here we examine three approaches that make use of different hardware options: two rely on 'performance-enhancing' features of the GPU, namely shared memory and the new shuffle instruction found in Kepler-based GPUs, and these are compared with a standard transfer of data that relies instead on an optimized storage layout to increase coalesced access. The simpler approach is shown to be the most efficient: the large number of registers required per thread in LBM limits the block size, which reduces the benefit of these special features. Detailed results are obtained for a D3Q19 LBM solver, benchmarked on nVidia K5000M and K20C GPUs. On the latter, the use of the read-only data cache is explored, and a peak performance of over 1036 Million Lattice Updates Per Second (MLUPS) is achieved. A periodic bottleneck in solver performance is also reported and is believed to be hardware-related: spikes in iteration time occur at a frequency of around 11 Hz on both GPUs, independent of problem size.
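To make the contrast in the abstract concrete, below is a minimal CUDA sketch (not taken from the paper) of the pull-style streaming step for a single D3Q19 direction, comparing a plain coalesced global read against a warp-shuffle register shift. The kernel names, array names (f_src, f_dst) and the NX/NY/NZ grid are illustrative assumptions, periodic wrapping in x is assumed for brevity, and the modern __shfl_up_sync intrinsic stands in for the original Kepler-era __shfl_up.

```cuda
// Hypothetical sketch, not the authors' code: pull streaming for one
// D3Q19 direction, q = 1 with velocity e = (+1, 0, 0).
#include <cstdio>
#include <cuda_runtime.h>

#define NX 128   // NX is assumed a multiple of the warp size (32)
#define NY 128
#define NZ 128
#define N  (NX * NY * NZ)
#define Q  19

// Structure-of-arrays layout f[q * N + cell]: adjacent threads touch
// adjacent addresses, so this "simple" version is fully coalesced.
// On compute-capability 3.5 Kepler parts (e.g. the K20C), the
// const __restrict__ qualifiers let the compiler route these loads
// through the read-only data cache (the same effect as __ldg).
__global__ void stream_pull_simple(const float* __restrict__ f_src,
                                   float* __restrict__ f_dst)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y, z = blockIdx.z;

    int xm   = (x == 0) ? NX - 1 : x - 1;     // -x neighbour, periodic
    int cell = (z * NY + y) * NX + x;
    int src  = (z * NY + y) * NX + xm;
    f_dst[1 * N + cell] = f_src[1 * N + src];
}

// Shuffle variant: each lane loads its own cell once, then takes the
// -x neighbour's value from the next-lower lane's register instead of
// issuing a second global read. Only lane 0 of each warp, which has
// no lower lane to shuffle from, falls back to global memory.
__global__ void stream_pull_shuffle(const float* __restrict__ f_src,
                                    float* __restrict__ f_dst)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y, z = blockIdx.z;

    int cell   = (z * NY + y) * NX + x;
    float mine = f_src[1 * N + cell];
    float left = __shfl_up_sync(0xffffffffu, mine, 1); // lane i-1 -> i

    if ((threadIdx.x & 31) == 0) {            // warp boundary fallback
        int xm = (x == 0) ? NX - 1 : x - 1;
        left = f_src[1 * N + (z * NY + y) * NX + xm];
    }
    f_dst[1 * N + cell] = left;
}

int main()
{
    float *f_src, *f_dst;
    cudaMalloc(&f_src, Q * N * sizeof(float));
    cudaMalloc(&f_dst, Q * N * sizeof(float));
    cudaMemset(f_src, 0, Q * N * sizeof(float));

    dim3 block(32, 1, 1);          // one warp per x-row segment
    dim3 grid(NX / 32, NY, NZ);
    stream_pull_simple<<<grid, block>>>(f_src, f_dst);
    stream_pull_shuffle<<<grid, block>>>(f_src, f_dst);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(f_src);
    cudaFree(f_dst);
    return 0;
}
```

The shuffle variant saves one redundant global read per cell by passing the neighbour's value between registers, but only within a warp. As the abstract notes, the large per-thread register demand of a full D3Q19 solver limits the block size, which is why the simple coalesced version proved most efficient in practice.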

Record details

  • Authors

    Mawson, M.; Revell, A.

  • Author affiliation
  • Year: 2014
  • Total pages
  • Original format: PDF
  • Language: eng
  • CLC classification
